UniMS: A Unified Framework for Multimodal Summarization with Knowledge Distillation
Authors
Abstract
With the rapid increase of multimedia data, a large body of literature has emerged on multimodal summarization, the majority of which targets refining salient information from the textual and image modalities to output a pictorial summary with the most relevant images. Existing methods mostly focus on either extractive or abstractive summarization and rely on the presence and quality of image captions to build image references. We are the first to propose a Unified framework for Multimodal Summarization grounded on BART, UniMS, that integrates extractive and abstractive objectives as well as selecting the image output. Specifically, we adopt knowledge distillation from a vision-language pretrained model to improve image selection, which avoids any requirement on the existence and quality of image captions. Besides, we introduce a visual guided decoder to better integrate the textual and visual modalities in guiding abstractive text generation. Results show that our best model achieves a new state-of-the-art result on a large-scale benchmark dataset. The newly involved extractive objective and the knowledge distillation technique are proven to bring a noticeable improvement to the multimodal summarization task.
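As a rough illustration of the knowledge distillation idea mentioned in the abstract, the sketch below scores candidate images with a teacher and a student and penalizes the student when its score distribution drifts from the teacher's. The abstract does not state the loss used in UniMS, so the KL-divergence form, the temperature parameter, and the names `softmax` and `distillation_loss` are assumptions made for illustration only, and the scores are made-up numbers rather than outputs of any real model.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw relevance scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over the candidate images of one document.

    Hypothetical loss: the teacher scores stand in for the relevance
    scores of a vision-language pretrained model, and the student scores
    for the summarizer's own image-selection scores.
    """
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Illustrative scores for three candidate images (not real data).
teacher = [2.0, 0.5, -1.0]
aligned = [1.8, 0.4, -0.9]      # student ranks images like the teacher
misaligned = [-1.0, 0.5, 2.0]   # student ranks images in reverse

loss_aligned = distillation_loss(teacher, aligned)
loss_misaligned = distillation_loss(teacher, misaligned)
```

Under these assumptions, a student whose ranking follows the teacher's gets a smaller loss than one that reverses it, which is the signal that would let image selection be trained without caption-based references.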
Similar resources
A unified framework for multimodal retrieval
In this paper, a unified framework for multimodal content retrieval is presented. The proposed framework supports retrieval of rich media objects as unified sets of different modalities (image, audio, 3D, video and text), by efficiently combining all monomodal heterogeneous similarities to a global one according to an automatic weighting scheme. Then, a multimodal space is constructed, to captu...
Towards a Framework for Abstractive Summarization of Multimodal Documents
We propose a framework for generating an abstractive summary from a semantic model of a multimodal document. We discuss the type of model required, the means by which it can be constructed, how the content of the model is rated and selected, and the method of realizing novel sentences for the summary. To this end, we introduce a metric called information density used for gauging the importance ...
A unified probabilistic generative framework for extractive spoken document summarization
In this paper, we consider extractive summarization of Chinese broadcast news speech. A unified probabilistic generative framework that combined the sentence generative probability and the sentence prior probability for sentence ranking was proposed. Each sentence of a spoken document to be summarized was treated as a probabilistic generative model for predicting the document. Two different mat...
A Unified Framework for Video Summarization, Browsing and Retrieval
Video content can be accessed by using either a top-down approach or a bottom-up approach [1, 2, 3, 4]. The top-down approach, i.e. video browsing, is useful when we need to get an “essence” of the content. The bottom-up approach, i.e. video retrieval, is useful when we know exactly what we are looking for in the content, as shown in Fig. 1. In video summarization, what “essence” the summary sh...
A Unified Submodular Framework for Multimodal IC Trojan Detection
This paper presents a unified formal framework for integrated circuits (IC) Trojan detection that can simultaneously employ multiple noninvasive measurement types. Hardware Trojans refer to modifications, alterations, or insertions to the original IC for adversarial purposes. The new framework formally defines the IC Trojan detection for each measurement type as an optimization problem and disc...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i10.21431